Agenda


* Mistakes we've made as Developers and how we've fixed them
* Mistakes we've made as Operations and how we've fixed them 


About me 


• I am not a DBA

• I am not even a DB Reliability Engineer (yes, that's a job!)
• I am an SRE with a backend SWE background, but I manage this:


About Gorgias (Hi Boss!) 


• Sales Pitch: 

* We build an integrated helpdesk for e-commerce brands, making 
it easy to deliver personalized support and automation across 
multiple channels. Connect all your business and social apps, and 
turn customer support into a revenue-generating activity! 


* Gorgias empowers 10,000+ online merchants like SteveMadden, 
PrincessPolly or MarineLayer to provide the best possible 
experience to their customers. 


* TL;DR: Provide supercharged mailboxes for online merchants, as SaaS


In the beginning 

Using [illegible] for Full Text Search

Issue 1: PG Sessions in idle-in-tx


* PG connections are expensive in CPU and memory 

• max_connections in postgresql.conf set to a few hundred

• With PgBouncer, the same PG connection can be used by
several clients ... but not at the same time!


[Diagram: 4 client connections share 2 server connections]

• Connections in the middle of a TX can't be reused
• Problem: what if the client (i.e. the app) forgets or takes a long
time to commit/rollback? At 3000 TX/s => quick connection starvation
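
A back-of-the-envelope check of the starvation claim, using Little's law (connections busy = transaction rate × average time a transaction holds its connection). The 3000 TX/s figure comes from the slide; the pool size of 300 and the hold times are assumed values chosen only for illustration.

```python
def connections_in_use(tx_per_second: float, hold_seconds: float) -> float:
    """Little's law: average number of server connections busy at once."""
    return tx_per_second * hold_seconds


TX_RATE = 3000   # from the slide
POOL_SIZE = 300  # assumption: "a few hundred" server connections

# Hypothetical hold times: how long each transaction keeps its connection.
for hold_ms in (10, 50, 500):
    busy = connections_in_use(TX_RATE, hold_ms / 1000)
    status = "ok" if busy <= POOL_SIZE else "starved"
    print(f"{hold_ms:>4} ms per TX -> {busy:6.0f} connections busy ({status})")

# Longest average hold time this pool can sustain at this rate, in milliseconds.
max_hold_ms = POOL_SIZE / TX_RATE * 1000
print(f"Pool of {POOL_SIZE} sustains at most {max_hold_ms:.0f} ms per TX")
```

Under these assumptions, a transaction that stays open for half a second while the app does something else would need far more connections than the pool has.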


Idle-in-tx (2)


• Typically happens when:


BEGIN
SELECT * FROM messages;
for each message {
    heavy Python processing and/or slow network calls
}
COMMIT


• Solutions? 

1. Fix the application code: not so easy 

2. Set idle_in_transaction_session_timeout = "10s" (ALTER SYSTEM | DATABASE | USER)
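
A minimal sketch of what option 1 can look like: read the rows inside a short transaction, close it, and only then do the slow work. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere; the table contents and the `process` function are hypothetical, and a real PG application would use its own driver or ORM.

```python
import sqlite3


def fetch_messages(conn):
    """Read all rows inside a short transaction and return them as a list."""
    conn.execute("BEGIN")
    try:
        rows = conn.execute("SELECT id, body FROM messages ORDER BY id").fetchall()
    finally:
        conn.execute("COMMIT")  # the transaction ends before any slow work starts
    return rows


def process(row):
    """Hypothetical placeholder for heavy processing or slow network calls."""
    return row[1].upper()


# isolation_level=None puts sqlite3 in autocommit mode; we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO messages (body) VALUES (?)", [("hello",), ("world",)])

rows = fetch_messages(conn)
still_in_tx = conn.in_transaction          # False: nothing is held open
results = [process(row) for row in rows]   # slow work happens outside the transaction
print(still_in_tx, results)
```

The trade-off is that all rows are held in application memory, so large result sets would need batching.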


Issue 2: Vacuuming Large Tables 


• Vacuum is a maintenance operation that removes dead rows from the table and index files.
• Vacuum is critical to maintain consistent query performance


Vacuum operates in 3 phases:

1. Scan the table: look for the dead rows and save their "location" in an in-memory buffer

2. Clean up indexes: delete elements that point to the location of dead rows (using the above buffer)

3. Clean up the table files by making the dead rows' location available for reuse

Strategy for large tables: better to pay a little frequently than a lot later on.

• Reduce autovacuum_vacuum_scale_factor
• Reduce autovacuum_vacuum_cost_delay
• Increase maintenance_work_mem
• Increase max_parallel_maintenance_workers
• Proactively run vacuum nightly
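
To see why the scale factor matters on large tables: autovacuum vacuums a table once its dead rows exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × row count (PG defaults: 50 and 0.2). The sketch below just evaluates that formula; the 1-billion-row table and the 0.01 setting are assumed values for illustration, not figures from the talk.

```python
def autovacuum_trigger(row_count: int, scale_factor: float = 0.2, threshold: int = 50) -> int:
    """Number of dead rows above which autovacuum vacuums the table."""
    return round(threshold + scale_factor * row_count)


ROWS = 1_000_000_000  # assumption: a hypothetical large table

default_trigger = autovacuum_trigger(ROWS)                   # PG default scale factor
tuned_trigger = autovacuum_trigger(ROWS, scale_factor=0.01)  # assumed tuned value

print(f"default (0.2): vacuum after {default_trigger:,} dead rows")
print(f"tuned  (0.01): vacuum after {tuned_trigger:,} dead rows")
```

With the default, the hypothetical table accumulates about 200 million dead rows before a vacuum starts. The setting can also be lowered per table as a storage parameter instead of globally.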


Table Partitioning


[Diagram: Hash(MerchantID) routes each row to one of the partition tables, e.g. "ticket_001", "ticket_007"]

• The partition key and the number of partitions (here 128) are critical to get right.
• How did we do it "live"?


"Live" Table Partitioning


• Debezium backfills the new partitioned table and uses logical replication to keep the 2 copies in sync
• Short "downtime" and table rename/swap

[Diagram: "Tickets" table -> logical replication -> "Tickets_part" table]


Note: no App code change, no double write 
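
A toy illustration of hash partitioning on the merchant ID with 128 partitions. It is a sketch only: PG uses its own internal hash function for hash-partitioned tables, so `zlib.crc32` here is a stand-in, and the `ticket_NNN` naming is an assumption based on the diagram labels.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 128  # from the slide


def partition_for(merchant_id: int) -> str:
    """Map a merchant to a partition name using a stable stand-in hash."""
    digest = zlib.crc32(str(merchant_id).encode("ascii"))
    return f"ticket_{digest % NUM_PARTITIONS:03d}"


# Every ticket of a given merchant lands in the same partition.
same = partition_for(42) == partition_for(42)

# Spread of 10,000 hypothetical merchants across the partitions.
counts = Counter(partition_for(m) for m in range(1, 10_001))
print(len(counts), "partitions used; the largest holds", max(counts.values()), "merchants")
```

This also hints at why the key is critical: queries that filter on the merchant touch one partition, while a single very large merchant still ends up entirely in one partition.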


Issue 3: Locks 


• 8 different table-level locks
• Devs regularly need to make schema changes
• Some recipes are well-known: create/drop index concurrently, add NOT NULL field with a DEFAULT
• Some less so: adding a FK constraint in 2 steps with NOT VALID then VALIDATE CONSTRAINT

1. CI Job with a linter?
2. Teaching and Code reviews

• Not enough: a very short ACCESS EXCLUSIVE lock can wait for a long statement and block all READ/WRITE to a single table

=> Run "migration" with lock_timeout = "10s"

[Screenshot: ALTER TABLE synopsis from the PostgreSQL documentation]

Issue 4: Forced Checkpoints 


* What are checkpoints again? 
• Checkpoints create a lot of disk I/O writes
• Balance between "WAL file not too big" (recovery time) and "saving IOPS"
• If the current WAL gets bigger than "max_wal_size", a checkpoint will be
forced before checkpoint_timeout has elapsed.


Checkpoint tuning 


1. Increase "checkpoint_timeout" from 5min (default) to 15min
2. Measure how much WAL grows in 15min


postgres@pg-main-0:/$ psql -c "SELECT pg_current_wal_insert_lsn();" && sleep $((15*60)) && psql -c "SELECT pg_current_wal_insert_lsn();"
 pg_current_wal_insert_lsn
---------------------------
 4646/D5865988
(1 row)

 pg_current_wal_insert_lsn
---------------------------
 4647/831335A8
(1 row)

postgres@pg-main-0:/$ psql -c "SELECT pg_size_pretty(pg_wal_lsn_diff('4647/831335A8','4646/D5865988'));"
 pg_size_pretty
----------------
 2777 MB
(1 row)


4. Set that as the "max_wal_size" value (maybe add 20% to be safe)


5. Forcefully kill PG (SIGKILL) and measure time to recovery


:32:36.021 UTC [13] LOG: database system was not properly shut down; automatic recovery in progress
:32:36.035 UTC [13] LOG: redo starts at 22/3E5CB440
:32:36.036 UTC [13] LOG: invalid record length at 22/3E5E7530: wanted 24, got 0
:32:36.036 UTC [13] LOG: redo done at 22/3E5E7AF8 system usage: CPU: user: 0.00 s, system: 0.00 s, e...
:32:36.060 UTC [13] LOG: checkpoint starting: end-of-recovery immediate
:32:36.447 UTC [13] LOG: checkpoint complete: wrote 60 buffers (0.0%); 0 WAL file(s) added, 0 removed...
..., average=0.005 s; distance=112 kB, estimate=112 kB
15:32:36.461 UTC [1] LOG: database system is ready to accept connections


6. If "too long", reduce checkpoint_timeout and max_wal_size
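
The WAL measurement in the steps above can be reproduced offline. An LSN such as 4647/831335A8 is two hexadecimal numbers: the high and low 32 bits of a byte position in the WAL stream. The sketch below turns the two LSNs from the slide into a size and adds the suggested 20% headroom; the function names are my own.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '4647/831335A8' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def wal_growth_mb(start_lsn: str, end_lsn: str) -> float:
    """WAL written between two LSNs, in MB (1 MB = 2**20 bytes, as pg_size_pretty counts)."""
    return (lsn_to_bytes(end_lsn) - lsn_to_bytes(start_lsn)) / 2**20


growth = wal_growth_mb("4646/D5865988", "4647/831335A8")  # LSNs from the slide
suggested = growth * 1.2                                  # 20% headroom, as in step 4

print(f"WAL written in 15 min: {growth:.0f} MB")
print(f"max_wal_size with 20% headroom: {suggested:.0f} MB")
```

The first figure matches the 2777 MB that pg_wal_lsn_diff reports on the slide.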


Armenia 


Issue 5: large JSONB Columns 


• JSONB type is great!
  * PG could feel like a document store with ACID guarantees
• BUT!
  * No column statistics
  * Large storage footprint:
    * Duplicated key names
    * Out-of-line storage (a.k.a. TOAST)
  * No partial updates
  * No partial reads
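
A rough feel for the "duplicated key names" cost: every document stores its own copy of each key, whereas a regular column stores the name once in the schema. The sketch below measures this on compact text JSON as a proxy. JSONB's binary format differs, so the exact ratio will not match, and the document shape is a made-up example.

```python
import json


def key_overhead(documents):
    """Return (bytes spent on key names, total serialized bytes) for flat documents."""
    key_bytes = sum(len(key.encode("utf-8")) for doc in documents for key in doc)
    total_bytes = sum(
        len(json.dumps(doc, separators=(",", ":")).encode("utf-8")) for doc in documents
    )
    return key_bytes, total_bytes


# Hypothetical documents: same three keys repeated in every row.
docs = [
    {"customer_email": f"user{i}@example.com", "is_open": True, "priority": i % 3}
    for i in range(1000)
]

key_bytes, total_bytes = key_overhead(docs)
print(f"{key_bytes / total_bytes:.0%} of the serialized bytes are repeated key names")
```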


Issue 6: Running PG in Kubernetes


* The setup (per shard): 
• 1 StatefulSet for the PG primary with exactly 1 replica 
• 1 StatefulSet for the PG Hot Standbys with N replicas


Conclusion


• Start with PG for JSON documents, for time series, for analytics, etc.

• You can go a long way with 1 primary and 1+ RO replicas (and 512GB of RAM)
* Get help and more specialized data stores later on... but not too late!

• Debezium and CDC are a Swiss army knife




Yandex 


